Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making
Today, AI is being increasingly used to help human experts make decisions in
high-stakes scenarios. In these scenarios, full automation is often
undesirable, not only due to the significance of the outcome, but also because
human experts can draw on their domain knowledge complementary to the model's
to ensure task success. We refer to these scenarios as AI-assisted decision
making, where the individual strengths of the human and the AI come together to
optimize the joint decision outcome. A key to their success is to appropriately
\textit{calibrate} human trust in the AI on a case-by-case basis; knowing when
to trust or distrust the AI allows the human expert to appropriately apply
their knowledge, improving decision outcomes in cases where the model is likely
to perform poorly. This research conducts a case study of AI-assisted decision
making in which humans and AI have comparable performance alone, and explores
whether features that reveal case-specific model information can calibrate
trust and improve the joint performance of the human and AI. Specifically, we
study the effect of showing confidence score and local explanation for a
particular prediction. Through two human experiments, we show that confidence
score can help calibrate people's trust in an AI model, but trust calibration
alone is not sufficient to improve AI-assisted decision making, which may also
depend on whether the human can bring in enough unique knowledge to complement
the AI's errors. We also highlight the problems of using local explanations in
AI-assisted decision making scenarios and invite the research community to
explore new approaches to explainability for calibrating human trust in AI.
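The case-by-case trust calibration described above can be illustrated with a simple decision policy: defer to the AI only when its reported confidence for the current case clears a threshold, and otherwise rely on the human's judgment. This is a hypothetical sketch, not the study's protocol; the function names and the threshold value are assumptions for illustration.

```python
def assisted_decision(ai_prediction, ai_confidence, human_prediction,
                      trust_threshold=0.8):
    """Joint decision under a confidence-gated trust policy (illustrative)."""
    if ai_confidence >= trust_threshold:
        return ai_prediction   # trust the model on high-confidence cases
    return human_prediction    # fall back on human expertise otherwise

# High confidence -> follow the AI; low confidence -> follow the human.
print(assisted_decision("approve", 0.92, "reject"))  # approve
print(assisted_decision("approve", 0.55, "reject"))  # reject
```

A policy like this only improves joint performance when the human's knowledge actually covers the cases where the model is unconfident, which is the complementarity condition the abstract highlights.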
Visualizations for an Explainable Planning Agent
In this paper, we report on the visualization capabilities of an Explainable
AI Planning (XAIP) agent that can support human-in-the-loop decision making.
Imposing transparency and explainability requirements on such agents is
especially important in order to establish trust and common ground with the
end-to-end automated planning system. Visualizing the agent's internal
decision-making processes is a crucial step towards achieving this. This may
include externalizing the "brain" of the agent -- starting from its sensory
inputs, to progressively higher order decisions made by it in order to drive
its planning components. We also show how the planner can bootstrap on the
latest techniques in explainable planning to cast plan visualization as a plan
explanation problem, and thus provide concise model-based visualization of its
plans. We demonstrate these functionalities in the context of the automated
planning components of a smart assistant in an instrumented meeting space.
Comment: Previously "Mr. Jones -- Towards a Proactive Smart Room Orchestrator"
(appeared in AAAI 2017 Fall Symposium on Human-Agent Groups).
Bootstrapping Conversational Agents With Weak Supervision
Many conversational agents in the market today follow a standard bot
development framework which requires training intent classifiers to recognize
user input. The need to create a proper set of training examples is often the
bottleneck in the development process. Agent developers often have
access to historical chat logs that can provide a good quantity as well as
coverage of training examples. However, the cost of labeling them with tens to
hundreds of intents often prohibits taking full advantage of these chat logs.
In this paper, we present a framework called \textit{search, label, and
propagate} (SLP) for bootstrapping intents from existing chat logs using weak
supervision. The framework reduces hours to days of labeling effort down to
minutes of work by using a search engine to find examples, then relies on a
data programming approach to automatically expand the labels. We report on a
user study that shows positive user feedback for this new approach to build
conversational agents, and demonstrates the effectiveness of using data
programming for auto-labeling. While the system is developed for training
conversational agents, the framework has broader application in significantly
reducing labeling effort for training text classifiers.
Comment: 6 pages, 3 figures, 1 table. Accepted for publication in IAAI 201
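The "propagate" step above can be sketched with a data-programming-style approach: a few keyword rules (labeling functions), written after searching and labeling seed examples, vote on intent labels for unlabeled chat-log utterances. All names and rules below are illustrative assumptions, not the SLP system's actual implementation.

```python
# Toy labeling functions: each returns an intent label or None.
def lf_billing(utterance):
    return "billing" if "invoice" in utterance.lower() else None

def lf_reset(utterance):
    return "password_reset" if "password" in utterance.lower() else None

LABELING_FUNCTIONS = [lf_billing, lf_reset]

def propagate(chat_logs):
    """Auto-label utterances by majority vote over labeling functions."""
    labeled = []
    for utterance in chat_logs:
        votes = [v for lf in LABELING_FUNCTIONS
                 if (v := lf(utterance)) is not None]
        if votes:  # keep only utterances at least one rule fires on
            label = max(set(votes), key=votes.count)
            labeled.append((utterance, label))
    return labeled

logs = ["Where is my invoice?", "I forgot my password", "hello there"]
print(propagate(logs))
# [('Where is my invoice?', 'billing'), ('I forgot my password', 'password_reset')]
```

In practice, frameworks for data programming learn to weight and denoise many such noisy rules rather than taking a raw majority vote, but the propagation idea is the same: a handful of labeled seeds plus cheap rules can label logs that would take hours or days to annotate by hand.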
Adult Social Work and High Risk Domestic Violence Cases
Summary
This article focuses on adult social work’s response in England to high-risk domestic violence cases and on the role of adult social workers in Multi-Agency Risk and Assessment Conferences (MARACs). The research was undertaken between 2013 and 2014 and
focused on one city in England. It involved the research team attending MARACs, interviews with 20 adult social workers, 24 MARAC attendees, and 14 adult service users at time T1 (including follow-up interviews after six months, T2), focus groups with IDVAs and Women’s Aid, and an interview with a Women’s Aid service user.
Findings
The findings suggest that although adult social workers accept the need to be involved in domestic violence cases, they are uncertain of what their role is and are confused by the need to operate parallel domestic violence and adult safeguarding approaches, which is further complicated by issues of mental capacity. MARACs are identified as overburdened, under-represented meetings staffed by committed managers. However, they are in danger of becoming managerial processes that neglect the service users they are meant to protect.
Applications
The article argues for a re-engagement of adult social workers with domestic violence, an area that has increasingly become over-identified with child protection. It also raises the question of whether MARACs remain fit for purpose and whether they still represent the best possible response to multi-agency coordination and practice in domestic violence.
The Knee Clinical Assessment Study – CAS(K). A prospective study of knee pain and knee osteoarthritis in the general population: baseline recruitment and retention at 18 months
BACKGROUND: Selective non-participation at baseline (due to non-response and non-consent) and loss to follow-up are important concerns for longitudinal observational research. We investigated these matters in the context of baseline recruitment and retention at 18 months of participants for a prospective observational cohort study of knee pain and knee osteoarthritis in the general population.
METHODS: Participants were recruited to the Knee Clinical Assessment Study – CAS(K) – by a multi-stage process involving response to two postal questionnaires, consent to further contact and medical record review (optional), and attendance at a research clinic. Follow-up at 18 months was by postal questionnaire. The characteristics of responders/consenters were described for each stage in the recruitment process to identify patterns of selective non-participation and loss to follow-up. The external validity of findings from the clinic attenders was tested by comparing the distribution of WOMAC scores and the association between physical function and obesity with the same parameters measured directly in the target population as a whole.
RESULTS: 3106 adults aged 50 years and over reporting knee pain in the previous 12 months were identified from the first baseline questionnaire. Of these, 819 consented to further contact, responded to the second questionnaire, and attended the research clinics. 776 were successfully followed up at 18 months. There was evidence of selective non-participation during recruitment (aged 80 years and over, lower socioeconomic group, currently in employment, experiencing anxiety or depression, brief episode of knee pain within the previous year). This did not cause significant bias in either the distribution of WOMAC scores or the association between physical function and obesity.
CONCLUSION: Despite recruiting a minority of the target population to the research clinics and some evidence of selective non-participation, this appears not to have resulted in significant bias of cross-sectional estimates. The main effect of non-participation in the current cohort is likely to be a loss of precision in stratum-specific estimates, e.g. in those aged 80 years and over. The subgroup of individuals who attended the research clinics and who make up the CAS(K) cohort can be used to accurately estimate parameters in the reference population as a whole. The potential for selection bias, however, remains an important consideration in each subsequent analysis.